[Hardware][NVIDIA] Add non-NVML CUDA mode for Jetson #9735
Conversation
Force-pushed from e41c937 to da79e3e
I don't think you need to create another platform class. If Jetson does not support NVLink, you can just detect Jetson and return false for the NVLink detection.
Creating a platform just for Jetson is overkill.
It's not just NVLink; currently that's only used for custom allreduce as far as I can tell. The CUDA platform class assumes NVML is available throughout to avoid initialising a CUDA context, but Jetson doesn't have NVML, so a different approach needs to be taken. I didn't want to pollute the CUDA platform by initialising a CUDA context, so I created a different one. Happy to rework if you prefer a different approach?
Alternatively, we could try to abstract away the NVML calls and make calls to an alternative backend on Jetson; jetson_stats might be usable, but I'm not sure all features would be covered (especially if we want to use other NVML features in future).
I believe Jetson is also currently the only CUDA-supporting device to use a unified memory configuration, so some optimisations may be possible there (e.g. handover of weights between CPU and GPU without a copy). However, it could be argued that the separate backend should wait until such optimisations are actually implemented.
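As a rough sketch of the simpler approach suggested above (the helper names here are illustrative, not vLLM's actual API), the NVLink topology query used by custom allreduce could simply be short-circuited when a Jetson device is detected, so NVML is never touched:

    import os

    def cuda_is_jetson() -> bool:
        # Jetson/L4T systems ship this file; see the detection check further down.
        return os.path.isfile("/etc/nv_tegra_release")

    def devices_fully_connected_by_nvlink(device_ids: list[int]) -> bool:
        # Hypothetical helper: Jetson has no NVLink (and JetPack 5 has no NVML),
        # so report False without attempting any NVML query.
        if cuda_is_jetson():
            return False
        # NVML-based topology check for desktop/server GPUs would go here.
        raise NotImplementedError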
Looking at the logs, adding 8.7 to the ARCHS is not affecting the binary size at all; the sizes reported for this PR and for latest main are the same. Does this make sense?
Possibly - as far as I know 8.6 and 8.7 are essentially compatible, so it's possible the same kernels are just being linked twice? I did have to add the 8.7 capability to get vLLM running on Orin, though; without it, vLLM would complain that there was no compatible kernel available.
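For context on the "no compatible kernel" complaint, a quick diagnostic (not part of this PR) is to compare the capability the running device reports against the arch list the local PyTorch build was compiled for:

    import torch

    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        print(f"Device compute capability: {major}.{minor}")  # reports 8.7 on Jetson Orin
        print(f"Arch list in this PyTorch build: {torch.cuda.get_arch_list()}")
        # If neither sm_87 nor a compatible arch is present in the builds being
        # loaded, kernel launches fail with a "no compatible kernel" style
        # error, which matches the behaviour described above.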
Force-pushed from 400c3d1 to ef7ff3c
Ok, I didn't catch this before, but this PR isn't adding the sm87 kernels to the wheel file (it just adds support for compiling them). IIUC, Jetson is ARM-only and the vLLM wheels are only built for x86, so all that looks good to me for this PR.
I leave it to @youkaichao to decide whether it makes sense to have Jetson in its own platform, but lgtm otherwise
I found https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/vllm, which builds vLLM on Jetson. I don't see any manipulation like this there.
Just investigated this. NVIDIA JetPack 6 includes NVML support, so the alternate platform will not be needed on JetPack 6. However, JetPack 5 and older still do not support NVML, and JetPack 6 is only being released for the Orin series, not for older Jetson devices. So if we want to support JetPack 5 devices, we will need to use the alternate non-NVML approach.
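A minimal way to express that distinction at runtime (an illustrative sketch, assuming pynvml is the NVML binding in use) is to probe whether NVML can actually be initialised rather than keying off the JetPack version:

    def nvml_available() -> bool:
        # True when NVML can be loaded and initialised (desktop/server GPUs,
        # and Jetson Orin on JetPack 6); False on JetPack 5 and older, where a
        # non-NVML code path is needed instead.
        try:
            import pynvml
            pynvml.nvmlInit()
            pynvml.nvmlShutdown()
            return True
        except Exception:
            return False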
Force-pushed from 51085cf to c93ecc4
Yes, I am currently building this directly on Jetson, outside of containers. The build process is fully specified by my nixpkgs branch; for Jetson support, make sure to override the source hash to use this branch. You are welcome to build it directly using Nix (I am running jetpack-nixos) or to adapt it to whatever build scripts you are using.
Bump, still pending review.
import os


def cuda_is_jetson() -> bool:
    return os.path.isfile("/etc/nv_tegra_release") \
        or os.path.exists("/sys/class/tegra-firmware")
I need to check with NVIDIA folks how robust it is.
The check came from this thread:
rapidsai/dask-cuda#400 (comment)
vllm/platforms/cuda.py (Outdated)

context.log_warnings()


class CudaPlatform(Platform):
We can have NvmlCudaPlatform and NonNvmlCudaPlatform inheriting from Platform, and at the end of this file, based on whether we are on Jetson or not, define a variable CudaPlatform to point to either NvmlCudaPlatform or NonNvmlCudaPlatform.
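A minimal sketch of that structure (the class bodies and the Jetson check are placeholders; only the class names follow the suggestion above):

    import os

    def cuda_is_jetson() -> bool:
        return os.path.isfile("/etc/nv_tegra_release")

    class Platform:
        """Stand-in for vLLM's platform base class."""

    class NvmlCudaPlatform(Platform):
        """CUDA platform backed by NVML, avoiding CUDA-context initialisation."""

    class NonNvmlCudaPlatform(Platform):
        """CUDA platform for devices without NVML (e.g. Jetson on JetPack 5)."""

    # At the end of the file, bind the name the rest of the codebase imports:
    CudaPlatform = NonNvmlCudaPlatform if cuda_is_jetson() else NvmlCudaPlatform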
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Conroy Cheers <[email protected]>
LGTM now! Thanks for the fix!
@conroy-cheers can you please merge main to resolve the conflict?
Ditto thanks for the responses!
Merged main and retested on Jetson. Ready to go, assuming CI passes 👍
Force-pushed from 6644c52 to 040dc12
Signed-off-by: Conroy Cheers <[email protected]>
Head branch was pushed to by a user without write access
Force-pushed from 040dc12 to 77daa0f
CI seems to be failing; not sure if it's related to the changes here?
@conroy-cheers the failing test is unrelated; I'll ask for a force merge. Thanks for your work!
Signed-off-by: Conroy Cheers <[email protected]>
Signed-off-by: Conroy Cheers <[email protected]> Signed-off-by: Andrew Feldman <[email protected]>
Signed-off-by: Conroy Cheers <[email protected]>
Signed-off-by: Conroy Cheers <[email protected]>
NVIDIA Jetson devices do not support the NVML protocol. This PR adds a discrete CudaJetsonPlatform platform which does not rely on NVML and is automatically used on Jetson devices when NVML cannot be initialized. The Jetson platform supports all CUDA functionality, but does not support NVLink; platform-specific checks are adjusted where appropriate to account for this.

Jetson Orin devices use CUDA capability 8.7, which is mostly identical to 8.6; capability 8.7 has been added to the definitions where 8.6 is used.
This has been built and tested on Jetson Orin AGX.
Fixes #9728